local linear convergence
Local Linear Convergence of Forward--Backward under Partial Smoothness
In this paper, we consider the Forward--Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which having a Lipschitz continuous gradient and the other being partly smooth relatively to an active manifold $\mathcal{M}$. We propose a generic framework in which we show that the Forward--Backward (i) correctly identifies the active manifold $\mathcal{M}$ in a finite number of iterations, and then (ii) enters a local linear convergence regime that we characterize precisely. This gives a grounded and unified explanation to the typical behaviour that has been observed numerically for many problems encompassed in our framework, including the Lasso, the group Lasso, the fused Lasso and the nuclear norm regularization to name a few. These results may have numerous applications including in signal/image processing processing, sparse recovery and machine learning.
Local Linear Convergence of Forward--Backward under Partial Smoothness
Jingwei Liang, Jalal Fadili, Gabriel Peyré
In this paper, we consider the Forward-Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which having a Lipschitz continuous gradient and the other being partly smooth relative to an active manifold M. We propose a generic framework under which we show that the Forward-Backward (i) correctly identifies the active manifold M in a finite number of iterations, and then (ii) enters a local linear convergence regime that we characterize precisely. This gives a grounded and unified explanation to the typical behaviour that has been observed numerically for many problems encompassed in our framework, including the Lasso, the group Lasso, the fused Lasso and the nuclear norm regularization to name a few. These results may have numerous applications including in signal/image processing processing, sparse recovery and machine learning.
Local Linear Convergence of Forward--Backward under Partial Smoothness
In this paper, we consider the Forward--Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which having a Lipschitz continuous gradient and the other being partly smooth relatively to an active manifold \mathcal{M} . We propose a generic framework in which we show that the Forward--Backward (i) correctly identifies the active manifold \mathcal{M} in a finite number of iterations, and then (ii) enters a local linear convergence regime that we characterize precisely. This gives a grounded and unified explanation to the typical behaviour that has been observed numerically for many problems encompassed in our framework, including the Lasso, the group Lasso, the fused Lasso and the nuclear norm regularization to name a few. These results may have numerous applications including in signal/image processing processing, sparse recovery and machine learning.
Local Linear Convergence of Forward-Backward under Partial Smoothness
In this paper, we consider the Forward-Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which having a Lipschitz continuous gradient and the other being partly smooth relative to an active manifold M. We propose a generic framework under which we show that the Forward-Backward (i) correctly identifies the active manifold M in a finite number of iterations, and then (ii) enters a local linear convergence regime that we characterize precisely. This gives a grounded and unified explanation to the typical behaviour that has been observed numerically for many problems encompassed in our framework, including the Lasso, the group Lasso, the fused Lasso and the nuclear norm regularization to name a few. These results may have numerous applications including in signal/image processing processing, sparse recovery and machine learning.
Fast global convergence of gradient descent for low-rank matrix approximation
Chen, Hengchao, Chen, Xin, Elmasri, Mohamad, Sun, Qiang
This paper investigates gradient descent for solving low-rank matrix approximation problems. We begin by establishing the local linear convergence of gradient descent for symmetric matrix approximation. Building on this result, we prove the rapid global convergence of gradient descent, particularly when initialized with small random values. Remarkably, we show that even with moderate random initialization, which includes small random initialization as a special case, gradient descent achieves fast global convergence in scenarios where the top eigenvalues are identical. Furthermore, we extend our analysis to address asymmetric matrix approximation problems and investigate the effectiveness of a retraction-free eigenspace computation method. Numerical experiments strongly support our theory. In particular, the retraction-free algorithm outperforms the corresponding Riemannian gradient descent method, resulting in a significant 29\% reduction in runtime.
- North America > Canada > Ontario > Toronto (0.28)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Can We Find Nash Equilibria at a Linear Rate in Markov Games?
Song, Zhuoqing, Lee, Jason D., Yang, Zhuoran
We study decentralized learning in two-player zero-sum discounted Markov games where the goal is to design a policy optimization algorithm for either agent satisfying two properties. First, the player does not need to know the policy of the opponent to update its policy. Second, when both players adopt the algorithm, their joint policy converges to a Nash equilibrium of the game. To this end, we construct a meta algorithm, dubbed as $\texttt{Homotopy-PO}$, which provably finds a Nash equilibrium at a global linear rate. In particular, $\texttt{Homotopy-PO}$ interweaves two base algorithms $\texttt{Local-Fast}$ and $\texttt{Global-Slow}$ via homotopy continuation. $\texttt{Local-Fast}$ is an algorithm that enjoys local linear convergence while $\texttt{Global-Slow}$ is an algorithm that converges globally but at a slower sublinear rate. By switching between these two base algorithms, $\texttt{Global-Slow}$ essentially serves as a ``guide'' which identifies a benign neighborhood where $\texttt{Local-Fast}$ enjoys fast convergence. However, since the exact size of such a neighborhood is unknown, we apply a doubling trick to switch between these two base algorithms. The switching scheme is delicately designed so that the aggregated performance of the algorithm is driven by $\texttt{Local-Fast}$. Furthermore, we prove that $\texttt{Local-Fast}$ and $\texttt{Global-Slow}$ can both be instantiated by variants of optimistic gradient descent/ascent (OGDA) method, which is of independent interest.
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Essex > Colchester (0.04)
Model identification and local linear convergence of coordinate descent
Klopfenstein, Quentin, Bertrand, Quentin, Gramfort, Alexandre, Salmon, Joseph, Vaiter, Samuel
For composite nonsmooth optimization problems, Forward-Backward algorithm achieves model identification (e.g. support identification for the Lasso) after a finite number of iterations, provided the objective function is regular enough. Results concerning coordinate descent are scarcer and model identification has only been shown for specific estimators, the support-vector machine for instance. In this work, we show that cyclic coordinate descent achieves model identification in finite time for a wide class of functions. In addition, we prove explicit local linear convergence rates for coordinate descent. Extensive experiments on various estimators and on real datasets demonstrate that these rates match well empirical results.
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- Europe > France > Bourgogne-Franche-Comté > Côte-d'Or > Dijon (0.04)
- North America > United States > New York (0.04)
Local Linear Convergence of Forward--Backward under Partial Smoothness
Liang, Jingwei, Fadili, Jalal, Peyré, Gabriel
In this paper, we consider the Forward--Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which having a Lipschitz continuous gradient and the other being partly smooth relatively to an active manifold $\mathcal{M}$. We propose a generic framework in which we show that the Forward--Backward (i) correctly identifies the active manifold $\mathcal{M}$ in a finite number of iterations, and then (ii) enters a local linear convergence regime that we characterize precisely. This gives a grounded and unified explanation to the typical behaviour that has been observed numerically for many problems encompassed in our framework, including the Lasso, the group Lasso, the fused Lasso and the nuclear norm regularization to name a few. These results may have numerous applications including in signal/image processing processing, sparse recovery and machine learning. Papers published at the Neural Information Processing Systems Conference.
Calculus of the exponent of Kurdyka-{\L}ojasiewicz inequality and its applications to linear convergence of first-order methods
In this paper, we study the Kurdyka-{\L}ojasiewicz (KL) exponent, an important quantity for analyzing the convergence rate of first-order methods. Specifically, we develop various calculus rules to deduce the KL exponent of new (possibly nonconvex and nonsmooth) functions formed from functions with known KL exponents. In addition, we show that the well-studied Luo-Tseng error bound together with a mild assumption on the separation of stationary values implies that the KL exponent is $\frac12$. The Luo-Tseng error bound is known to hold for a large class of concrete structured optimization problems, and thus we deduce the KL exponent of a large class of functions whose exponents were previously unknown. Building upon this and the calculus rules, we are then able to show that for many convex or nonconvex optimization models for applications such as sparse recovery, their objective function's KL exponent is $\frac12$. This includes the least squares problem with smoothly clipped absolute deviation (SCAD) regularization or minimax concave penalty (MCP) regularization and the logistic regression problem with $\ell_1$ regularization. Since many existing local convergence rate analysis for first-order methods in the nonconvex scenario relies on the KL exponent, our results enable us to obtain explicit convergence rate for various first-order methods when they are applied to a large variety of practical optimization models. Finally, we further illustrate how our results can be applied to establishing local linear convergence of the proximal gradient algorithm and the inertial proximal algorithm with constant step-sizes for some specific models that arise in sparse recovery.
- Asia > China > Hong Kong (0.04)
- Oceania > Australia > New South Wales (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Local Linear Convergence of Forward--Backward under Partial Smoothness
Liang, Jingwei, Fadili, Jalal, Peyré, Gabriel
In this paper, we consider the Forward--Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which having a Lipschitz continuous gradient and the other being partly smooth relatively to an active manifold $\mathcal{M}$. We propose a generic framework in which we show that the Forward--Backward (i) correctly identifies the active manifold $\mathcal{M}$ in a finite number of iterations, and then (ii) enters a local linear convergence regime that we characterize precisely. This gives a grounded and unified explanation to the typical behaviour that has been observed numerically for many problems encompassed in our framework, including the Lasso, the group Lasso, the fused Lasso and the nuclear norm regularization to name a few. These results may have numerous applications including in signal/image processing processing, sparse recovery and machine learning.